AITopics | efficient reinforcement learning

Collaborating Authors

efficient reinforcement learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Sequential Preference Ranking for Efficient Reinforcement Learning from Human Feedback

Neural Information Processing SystemsDec-26-2025, 10:09:00 GMT

However, existing RLHF models are considered inefficient as they produce only a single preference data from each human feedback. To tackle this problem, we propose a novel RLHF framework called SeqRank, that uses sequential preference ranking to enhance the feedback efficiency. Our method samples trajectories in a sequential manner by iteratively selecting a defender from the set of previously chosen trajectories $\mathcal{K}$ and a challenger from the set of unchosen trajectories $\mathcal{U}\setminus\mathcal{K}$, where $\mathcal{U}$ is the replay buffer. We propose two trajectory comparison methods with different defender sampling strategies: (1) sequential pairwise comparison that selects the most recent trajectory and (2) root pairwise comparison that selects the most preferred trajectory from $\mathcal{K}$. We construct a data structure and rank trajectories by preference to augment additional queries. The proposed method results in at least 39.2% higher average feedback efficiency than the baseline and also achieves a balance between feedback efficiency and data dependency. We examine the convergence of the empirical risk and the generalization bound of the reward model with Rademacher complexity. While both trajectory comparison methods outperform conventional pairwise comparison, root pairwise comparison improves the average reward in locomotion tasks and the average success rate in manipulation tasks by 29.0% and 25.0%, respectively. The source code and the videos are provided in the supplementary material.

efficient reinforcement learning, pairwise comparison, sequential preference ranking, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

(More) Efficient Reinforcement Learning via Posterior Sampling

Neural Information Processing SystemsSep-30-2025, 11:52:43 GMT

Most provably efficient learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration. We study an alternative approach for efficient exploration, posterior sampling for reinforcement learning (PSRL). This algorithm proceeds in repeated episodes of known duration. At the start of each episode, PSRL updates a prior distribution over Markov decision processes and takes one sample from this posterior. PSRL then follows the policy that is optimal for this sample during the episode. The algorithm is conceptually simple, computationally efficient and allows an agent to encode prior knowledge in a natural way. We establish an $\tilde{O}(\tau S \sqrt{AT})$ bound on the expected regret, where $T$ is time, $\tau$ is the episode length and $S$ and $A$ are the cardinalities of the state and action spaces. This bound is one of the first for an algorithm not based on optimism and close to the state of the art for any reinforcement learning algorithm. We show through simulation that PSRL significantly outperforms existing algorithms with similar regret bounds.

artificial intelligence, efficient reinforcement learning, machine learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Efficient Reinforcement Learning by Discovering Neural Pathways

Neural Information Processing SystemsMay-26-2025, 18:33:58 GMT

Reinforcement learning (RL) algorithms have been very successful at tackling complex control problems, such as AlphaGo or fusion control. However, current research mainly emphasizes solution quality, often achieved by using large models trained on large amounts of data, and does not account for the financial, environmental, and societal costs associated with developing and deploying such models. Modern neural networks are often overparameterized and a significant number of parameters can be pruned without meaningful loss in performance, resulting in more efficient use of the model's capacity lottery ticket. We present a methodology for identifying sub-networks within a larger network in reinforcement learning (RL). We call such sub-networks, neural pathways.

artificial intelligence, efficient reinforcement learning, machine learning, (1 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Sequential Preference Ranking for Efficient Reinforcement Learning from Human Feedback

Neural Information Processing SystemsJan-19-2025, 16:23:56 GMT

efficient reinforcement learning, pairwise comparison, sequential preference ranking, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Efficient Reinforcement Learning for Optimal Control with Natural Images

Loxley, Peter N.

arXiv.org Artificial IntelligenceDec-11-2024

Reinforcement learning solves optimal control and sequential decision problems widely found in control systems engineering, robotics, and artificial intelligence. This work investigates optimal control over a sequence of natural images. The problem is formalized, and general conditions are derived for an image to be sufficient for implementing an optimal policy. Reinforcement learning is shown to be efficient only for certain types of image representations. This is demonstrated by developing a reinforcement learning benchmark that scales easily with number of states and length of horizon, and has optimal policies that are easily distinguished from suboptimal policies. Image representations given by overcomplete sparse codes are found to be computationally efficient for optimal control, using fewer computational resources to learn and evaluate optimal policies. For natural images of fixed size, representing each image as an overcomplete sparse code in a linear network is shown to increase network storage capacity by orders of magnitude beyond that possible for any complete code, allowing larger tasks with many more states to be solved. Sparse codes can be generated by devices with low energy requirements and low computational overhead.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2412.08893

Country: Oceania > Australia (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

Neural Information Processing SystemsMar-14-2024, 14:41:45 GMT

We study the problem of adaptive control of a high dimensional linear quadratic (LQ) system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, for the average cost LQ problem, a regret bound of O( T) was shown, apart form logarithmic factors.

algorithm, controller, probability, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Stanford (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Add feedback

Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

Neural Information Processing SystemsFeb-16-2024, 07:21:24 GMT

We study the problem of adaptive control of a high dimensional linear quadratic (LQ) system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, an asymptotic regret bound of \tilde{O}(\sqrt{T}) was shown for T \gg p where p is the dimension of the state space. In this work we consider the case where the matrices describing the dynamic of the LQ system are sparse and their dimensions are large. We present an adaptive control scheme that for p \gg 1 and T \gg \polylog(p) achieves a regret bound of \tilde{O}(p \sqrt{T}) .

adaptive control scheme, efficient reinforcement learning, high dimensional linear quadratic system, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems

Jali, Neharika, Qu, Guannan, Wang, Weina, Joshi, Gauri

arXiv.org Artificial IntelligenceFeb-2-2024

We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers. Unlike homogeneous systems, a threshold policy, that routes jobs to the slow server(s) when the queue length exceeds a certain threshold, is known to be optimal for the one-fast-one-slow two-server system. But an optimal policy for the multi-server system is unknown and non-trivial to find. While Reinforcement Learning (RL) has been recognized to have great potential for learning policies in such cases, our problem has an exponentially large state space size, rendering standard RL inefficient. In this work, we propose ACHQ, an efficient policy gradient based algorithm with a low dimensional soft threshold policy parameterization that leverages the underlying queueing structure. We provide stationary-point convergence guarantees for the general case and despite the low-dimensional parameterization prove that ACHQ converges to an approximate global optimum for the special case of two servers. Simulations demonstrate an improvement in expected response time of up to ~30% over the greedy policy that routes to the fastest available server.

algorithm, server, threshold policy, (13 more...)

arXiv.org Artificial Intelligence

2402.01147

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Spain (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Scilab-RL: A software framework for efficient reinforcement learning and cognitive modeling research

Dohmen, Jan, Röder, Frank, Eppe, Manfred

arXiv.org Artificial IntelligenceJan-25-2024

One problem with researching cognitive modeling and reinforcement learning (RL) is that researchers spend too much time on setting up an appropriate computational framework for their experiments. Many open source implementations of current RL algorithms exist, but there is a lack of a modular suite of tools combining different robotic simulators and platforms, data visualization, hyperparameter optimization, and baseline experiments. To address this problem, we present Scilab-RL, a software framework for efficient research in cognitive modeling and reinforcement learning for robotic agents. The framework focuses on goal-conditioned reinforcement learning using Stable Baselines 3 and the OpenAI gym interface. It enables native possibilities for experiment visualizations and hyperparameter optimization. We describe how these features enable researchers to conduct experiments with minimal time effort, thus maximizing research output.

scilab-rl, variance, visualization, (13 more...)

arXiv.org Artificial Intelligence

2401.14488

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Portugal > Braga > Braga (0.04)
Europe > Germany > Hamburg (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.51)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Filters

Collaborating Authors

efficient reinforcement learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Sequential Preference Ranking for Efficient Reinforcement Learning from Human Feedback

(More) Efficient Reinforcement Learning via Posterior Sampling

Efficient Reinforcement Learning by Discovering Neural Pathways

Sequential Preference Ranking for Efficient Reinforcement Learning from Human Feedback

Efficient Reinforcement Learning for Optimal Control with Natural Images

Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

More Efficient Reinforcement Learning via Posterior Sampling

Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems

Scilab-RL: A software framework for efficient reinforcement learning and cognitive modeling research